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Abstract 

Optogenetics is a new tool to stimulate genetically targeted neuronal circuits us- 
ing light flashes that can be delivered at high frequencies. It has shown promise for 
studying neural circuits that are critically involved in normal behavior as well as neu- 
ropsychiatric disorders. The data from experiments in which circuits are stimulated 
by optogenetics presents the statistical challenge of modeling a high frequency point 
process (neuronal spikes) while the input is another high frequency point process 
(light flashes). We propose a new simple probabilistic approach to model the rela- 
tionships between two point processes which employs additive point-process response 
functions on the logit scale. The resulting model, Point-process Responses for Opto- 
genetics (PRO), provides explicit nonlinear transformations to link the input point 
process with the output one. Such response functions may provide important and 
interpretable scientific insights into properties of the biophysical process that govern 
neural spiking in response to optogenetics stimulation. We validate the PRO model 
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via out-of-sample prediction on a large dataset from optogenetics experiments. Com- 
pared with other statistical and biophysical models, the PRO model yields a superior 
area-under-the-curve value as high as 93% for predicting every future spike within a 
5ms interval. Another advantage of the PRO model is its efficient computation via 
standard logistic regression procedures. We also demonstrate by simulation that the 
PRO model performs well for neurons following the classical Leaky Integrate and Fire 
(LIF) model. A spline extension of the PRO approach is also illustrated. Finally, the 
PRO model outputs a few simple parameters summarizing how neurons integrate spe- 
cific patterns of light inputs. To demonstrate this, we use the PRO model to analyze 
simulated data from an LIF model. Comparing these summary parameters enables 
neuroscientists to further study how neural circuits are altered by under various dis- 
ease conditions and/or experimental manipulations. 

Keywords: optogenetics, point processes, generalized linear models, response functions, 
neuronal data, generalized additive models, prediction. 
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1 Introduction 



In 1979, Francis Crick suggested it would be of great value for understanding how the 
brain works to be able to deliver an input to a specific neuron in the brain at a specific 
time while leaving oth er neurons unaltered, and then observe the downstream effects of 
the input (jCricki . LL999J)- Optogenetics is a new technology for accomplishing this goal; it 
works by using genetic tools to make neurons sensit ive to light . The n light flashes can 
deliver input to neurons in a fast and targeted way (IDeisserothl . 120101 ) . One application 
of optogenetics is to study how neural circuits (networks) process their inputs. A neural 
circuit is an ensemble of interconnected neurons that works together to process specific 
kinds of information. For circuits involved in early stages of sensory processing such as 
those in the retina, the way in which the circuits process information was studied before 
the invention of optogenetics by delivering precise patterns of input and using statistical 
models to infer the feat ures of the input t hat the circuit's o utput (i.e., the spike trains 



of the neurons) encode (perry et al. 



1997; 



Keat et al. 



200ll ). However, for circuits that 



are deeper in the brain (e.g., those in the prefrontal cortex involved in working memory, 
attention, cognitive control, and social cognition) precise control of the inputs to the circuits 
has only been made possible recently by the development of optogenetics. We will consider 
specific experiments to underst and the processin g of information in the recurrent layer V 
circuit in the prefrontal cortex (IGee et all |2012| ). Malfunctions of this circuit are thought 
to play a role in schiz ophrenia, a devastating illness that affects approximately 1% of the 



popu lation worldwide (IGoldman-Rakic and Selemonl . 11997 



Barch et al. 



2001 



iLewis et al. 



2005|). 



In our experiments, a brain slice that contains a recurrent layer V circuit from a mouse 
is used, light flashes are delivered to a specific neuron at randomly chosen times and 
the spikes of the neuron are recorded. Neurons that are directly activated by light may 
recruit other excitatory and inhibitory neurons in the circuit, producing a combination 
of excitatory and inhibitory input that continues to reverberate long after the original 
stimulus ends. An example of data from our experiments is shown in Figure [TJ Each line 
shows the stimuli (the light flashes) and outputs (spikes of the neuron) for one run (sweep) 
of the experiment. The scientific questions we seek to answer are the following: (1) what 
role do the presence or absence of recent stimulation (light flashes), the past history of 
stimulation and the past activity of the neuron (i.e., the times of previous spikes) play 
in the probability of a neuron spiking at a given time; (2) how predictable are the spikes 
based on the light flashes, i.e., how much noise is there in how the neurons process their 
inputs; and (3) how do (1) and (2) differ between different types of neurons in the circuit 
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(neurons in the circuit can be subdivided into different types based on morphology and 
electrophysiological properties) and when different drugs are applied to the circuit. In this 
paper, we develop a new statistical model for spike train data to address these questions. 
Our model builds on existing spike train models that were developed for other types of 
experiments but incorporates new elements that are motivated by the specific features of 
optogenetics stimulation of neural circuits. 

Probabilistic approaches have been successfully applied to model neuronal data. One 
particularly useful formulation employs generalized linear models to l ink th e input stimulus 
with t he output spikes, see for example iBrillingerl (119881 ); iPaninskil (120041 ); iTruccolo et al. 
(120051 ) . Such approaches are particularly useful when the spiking patterns are similar 
across trials. This is usually achieved by repeating the stimuli across several runs. Counts 
of spikes within aligned ti me bins are then m o deled by inhomogeneous Poisson processes, 
using large sample theory. iKass and Ventura! ( 120011 ) provided an important model in this 
nature. However, our current experiment employs randomly generated stimuli in order to 
ensure generalizability. More importantly, the output spiking patterns are also very sparse, 
as shown in (TJ and the Poisson proc ess theory no longer h olds in this setting. A Bernoulli 



modification has been suggested by 



Truccolo et al 



(hoosl ) 



for the repeated stimuli design 



context. However, these models were not designed for settings like ours in which the spiking 
behavior and non-repeated stimulus process are sparse and precise in time. 

Along with the sparsity in data, the probability of a n euron firing as a function of 
time elapsed since the stimulus can be highly nonlinear (jBrillinger , 



1988 



Keat et al.l. 



2001 



Kass et al. 



20051 ) . In statistical literature, nonparametric approache s have thus 



been applied to account for such nonlinear underlying processes. IBrillingerl (119881 ) con- 
sid ered higher order polynomials in modeling the underlying probability. Recent reviews 



by 



Kass et al 



(120051 ) suggest that using splines is an appropriate choice. From a statistical 



poin t of view, estimating nonl inear functions from sparse designs are particularly challeng- 



ing. iSeifert and Gasserl ( 119961 ) showed that one needs sophisticated procedures to overcome 
the challenges present in theory and practice, when the desig n points are spars e . Ano ther 
simple remedy by adding additional points was suggested by lHall and Turlachl ( 119971 ). In 
particular, the latter correction however is not available in our experiment, as the input 
processes are randomly generated and the output processes are not controlled by the ex- 
perimenters. The investigator employed randomly generated inputs in order to achieve the 
most generalizability of the models to future inputs. As a consequence, the data provide 
only limited flash patterns as predictors. For example, only limited samples with the same 
or similar flash patterns will be observed, and the observed patterns cover only a small 
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Figure 1: Raster plot of spikes and flashes in 20 sweeps. Red long vertical arrows are 
spikes, and blue short vertical bars with solid circle heads are flashes. 
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portion of all possible patterns that could arise. In this paper, we adopt a very different 
approach to model the outcome instead, first by wrapping the two point processes across 
time; this creates more replicability of the flash patterns. However, even after wrapping, 
similar flash patterns preceding a spike are limited within the data. To address this, we 
construct simple non-linear functions, called point-process response functions, to summa- 
rize the important statistics of these flash patterns. These functions are also shown to have 
important biophysical interpretations. 

Response functions have been previously applied for analyzing and modeling classi- 



Gerstner 


(1995 


); 


Kistler et al. 


(mi-. 



Response Model (SRM ) based on the classic Integrate-and-Fire (IF) model. By integrating 
the IF model, they replaced the i ntegrands by a gener al response kernel function, indepen- 



dent of input. 



Keat et al. 



Geffen et al. 



( 120091 ) constructed linear-nonlinear models 



for spike train data. Their model computes a generating potential by convolving an im- 
pulse response function with the stimuli process. The model outputs the predicted spikes 
by thresholding the up-crossing of the gene rator potential . The i mpulse function an d the 



threshold are then optimized to fit the data. 



Paninski et al 



(120041 ): 



Pillow et al. 



(120051 ) pro- 



posed a kernel function as functional projection of input using a stochastic IF model, and 
estimated the kernel functions using maximum likelihood. However, due to their goals of 
estimating such nonlinear response functions, these models are usually applied to classical 
spike train data where repeated stimuli are applied and aligned. 

In the setting of such sparse responses and inputs, we introduce Point-process Responses 
for Optogenetics (PRO) to parametrically model response functions. This approach has 
several nice features. The simple form avoids overfitting, yet provides highly generalizable 
models. Moreover, this approach also provides additional advantages in interpretation by 
three summary statistics. In addition, the computation is very fast using existing logistic 
regression implementations. Finally and perhaps most importantly, the outputs from this 
model can be related to other biophysical models, and the model coefficients reveals impor- 
tant aspects of how the circuits processes inputs. We will use logistic regression to fit the 
coefficients associated with these response functions, and the coefficients can be interpreted 
as characteristics of how the neuron processes information. The ways in which different 
types of neurons process information or how a drug affects the processing of information 
can be assessed by comparing the coefficients between different neurons or under different 
drug treatments. 

Our approach provides highly accurate prediction of future spikes. Indeed, several 
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pap ers reporte d befo re have measu r ed pr edi ction accurac y of v arious neuronal models 



sec 



Keat et al. 



(120011); 



Pillow et al 



fl2005[ ). 



Jolivet et al 



( 120041 ) studied the prediction 



performance of the nonlinear IF model and the SRM, and a typical prediction accuracy 
is 70-80%. These measures were obtained by building the models and predicting spike 
times under the same stimulus sequence unfortunately. We instead applied a out-of-sample 
evaluation approach, where we aimed to predict what will happen in the future given the 
history up to now. Using this benchmark, the PRO method gives high area-under-the-curve 
(AUC) values for real data, in the high 90%. 

The rest of the paper is organized as follows. We describe the dataset in Section |2j We 
then motivate the forms of the response functions in Section [3J In Section HJ we compare 
the PRO approach with other methods using an optogenetics experiment. We also compare 
these approaches using simulated data from a classic neuronal model in Section [5J and the 
biophysical interpretations of the PRO models are also illustrated. Finally, we discuss 
future directions in Section 



2 An Optogenetics Experiment 



The optogenetic s experiment protocol has been previously described in lSohal et al.l ((20091); 



Gee et al. 



(120121 ) . The input stimuli consist of trains of short flashes of blue light (470nm). 
The presence of a flash within each 5ms interval is coded by a 1 if a flash is present and 
otherwise. The timing of such flashes are randomly generated from independent Bernoulli 
random variables for each 5ms interval. Spiking is also binary coded for each 5ms interval, 
where 1 or indicates the presence or absence of a spike, respectively. Each 5ms interval 
contains at most 1 flash and 1 spike. Thus we obtain two binary sequences of spikes and 
flashes for each run. 

We use 5ms bins because this is a reasonable time resolution to model the behaviors 
of neurons which rarely spike at rates > 100 Hz. We choose to model spikes instead of 
membrane potentials, because this simplifies the analysis and spikes represent the relevant 
output from neurons 

Figure Q] illustrates the raster plot of the spikes and light flashes in an experiment in 
which 20 light trains are delivered on consecutive trials ("sweeps"). We observe 169 spikes 
in the first 10 sweeps, and 181 spikes in the next 10 sweeps. This is a result of 686 flashes 
in the first 10 sweeps and 691 flashes in the next 10 sweeps. The total number of time 
points (5ms bin) for each sweep is 500. Roughly, the neuron spikes after about every 4 
flashes. It is noted that the input sequence for each sweep is different as they are randomly 
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generated, and the intra-flash times could be very different before each spike as well. 

In all of the following analyses, we will first divide the data into two parts to test model 
generalizability. The first 10 sweeps will be used as a training dataset to build models, and 
the last 10 sweeps will be used as a testing dataset to validate the models. 



3 Model 

We will first introduce some basic assumptions to simplify the modeling of these point pro- 
cesses in a probabilistic framework. We will then describe the forms of response functions, 
and relate these to their biophysical interpretations. A spline nonparametric approach will 
also be introduced based on these response functions. 



3.1 Model Assumptions 

To model the effect of the input sequence on the output sequence, we start with the following 
general probability model 

V (s t = lls^, ft) = H t (sl\ ft) (1) 

where s t is the spiking status at time interval t (1 means spike, otherwise), Sq -1 is 
the sequence of spiking statuses from time interval to time interval t (i.e. Sq _1 = 
(s Q , si, . . . , s t -i)), and similarly ft = (fo, fx, ■ ■ ■ , ft) with f t denoting the light stimula- 
tion status at time interval t (1 means light flashing, otherwise). Note that we may 
include the information from f t as a predictor for the response s t , because usually St is 
believed to happen immediately after, to respond to f t , even if they fall in the same time 
interval. 

The general form of the model ([Q) assumes no reproducibility of the neuronal spiking. 
It is less useful in practice because it does not provide information about the mechanisms 
of neuronal firing nor does it provide predictive systems for future neuronal spikes. We 
make two important assumptions to simplify the forms of H t , such that the function can 
be easily estimated from the data. T hese two assumpti o ns ha ve been employed in past 
models analyzing spike train data, see iKass and Ventura! ( 120011 ) for example. We need to 
generalize their assumptions to our data because of the sparse stimulus point process. In 
Section H] and [51 we will use a real data set and simulated data to validate such assumptions 
empirically. 

The first assumption is that the function H t depends only on the time of the last spike 
and the flashes f t back to the last flash time $ before the last spike time t* preceding the 
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time t. That is 



H t {sl~\ f ) = H t (t\ f tt ) 



(2) 



This assumption roughly coincides with our scientific hypothesis that the neurons reset 
their membrane potential to zero after every spike, but we here allow this dependence to 
be slightly longer back to t* because it may contain information on how the sparse inputs 
are integrated in neurons. This is also due to the hypothesis that the sensitivity of neuronal 
integration of input may depend on the spacing of flash times, which is independent of the 
membrane potential reset. In practice, the resulting model fits the data better. We will 
estimate explicitly the dependent function H t from the data. 

In comparison, iKass and Ventura! ( 120011 ) considered a slightly different assumption com- 
paring to Equation (J2J). They replaced t* with £*, and their formulation does not contain 
input point process / at all. Thus, their fo rmulat i on fo r optogenetics data is limited. 



Models, such as the integra te-and-fire model ( iKochl . 119991 ) and the spike response model 
( IGerstner and Kistlerl . 120021 ). usually reset the membrane potential to zero after every spike. 
We found in optogenetics data, the stimulus integration process may be slightly different 
from such simple resetting, and thus using t* instead of t* gives better prediction perfor- 
mance. 

A special case of Equation ([2]) is that it takes an additive form after logit transformation, 
which is our second assumption. That is, we assume 



logit (H t (**, f tt )) =P + J2 ^ (** ' &) 



(3) 



k=i 



where /3o is the intercept and /3k are the coefficients. We use the logit transformation 
because the model can be efficiently computed from the data using standard statistical 
packages. 

The assumption of additive forms furt her simplifies the computati on as well, and it 
is related to generalized additive models (IHastie and Tibshiranil . Il990l ). which is widely 
available. Moreover, the logit transform can be approximated by the logarithm transform 
if the function H t is small, and thus the formation ([3]) takes the similar additive form of a 



special case, multiplicative IMI processes, considered in kass and Ventura ( 2001 ). Additive 
forms t o include the d e pendence on th e stim ulus have also been considered in iBrillinger 



( jl988f ): IPaninskil ( 12004 ): iTruccolo et all ( 120051 ). 

This logit transform can be understood to model sharp rises in firing probability when 
the left hand side of Equation changes linearly. The sharp rising phenomenon is con- 
sidered appropriate for neuronal data in some biophysical mod els as well. For example, the 



linear-nonlinear model (iKeat et al 



2001 



Geffen et al. 



20091 ) employs also a sharp rising 
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non-linear function, similar to the logistic function, to transform the linear components 
related to the stimulus. A critical step of applying these likelihood methods is to provide 

(k) 

or estimate continuous function atoms h\ from the flash point process, and we turn to it 
now. 

3.2 Response Function Forms 

We propose a simple formulation of the summation in Equation [31 which contains the 
following 3 components 



where P is the last flash time. These functions can be constructed parametrically or esti- 
mated non-parametrically from the data. The first component describes the time elapsed 
since the last flash. The second component characterizes the total number of flashes since 
the last spike. The third function measures the spacing of flashes since t*. 

As described before, these three functions can be estimated from the data using GAM. 
However, we consider that approach to be computationally expensive, and the interpreta- 
tion is rather limited. More importantly, it is difficult to compare different neurons using 
these nonparametric functions. Instead, we will first construct these functions explicitly in 
parametric forms. Second stage inferences of comparing neurons characteristics are then 
straightforward, which amounts to testing the differences between these parameters. As a 
complimentary procedure and validity check, we will also compare the parametric functions 
with the nonparametric estimates. 

We now introduce the parametric forms of the functions in Equation (j^J). Let %)'s, for 
j > 1, be reversely ordered previous flash times of Ff = jr : f T = 1, r = . . . , t — 1} since 
the last flash before t*. We set £( ) = t for notation simplicity. The three functions are set 
to be 



To ensure existence of the logarithm transform, a small value 1 is added to the base quanti- 
ties before taking the logarithm. We take the logarithm transform and the log-log transform 




(4) 



h (t - t f ) 



log (1 + 1 - t f ) 
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because they spread out the clusters of points with small values and have interpretations 
of relative changes rather than absolute changes because of the transform. Empirically, 
they also fit the data well, and we will use a GAM model to validate the forms of these 3 
response functions. 

The interpretation for the functions h\ and are straightforward. They are logarithmic 
transformed values of the elapsed time since the last flash and the total number of flashes 
since the last spike respectively. We shall call the fitted coefficients post-flash (PF) and 
cumulative-flash (CF) parameters. The third function h% is measuring the spacing of these 
flashes since £*, conditional on a fixed \F t \ and £ — £*, because of Lemma [TJ We shall call 
the fitted coefficient spread-flash (SF) parameter. Figure [2] shows the schematics of these 
3 functions as responses to example sequences of flashes and spikes. Note that these 3 
functions does not dependent on the current spike status, s (£), and they form valid design 
matrix for predicting the future spiking probability using the existing information, in a 
logistic regression framework (J3]). 

We have the following lemma concerning the values of the SF function. 

Lemma 1. Fix \F t \ and t — £*. The SF function achieves its maximum when £y) = t max 
for all j = 1, . . . ,\F t \ — 1 where £ max denotes either t or t* . The SF function achieves its 
minimum when t(j\ — tr^-i) = \_(t — £*) / \F t \\ if j e S and t^ — ty-i) = \(t — £*) / \F t \~\ 
otherwise, where the set S satisfies \S\ = \F t \ \(t — t$) / \F t \~] — (t — £*) , £(|F t |) = £* , an d 
£(o) = £• 

The implication of Lemma [T] is intuitive. Simply, this function is maximized if all the 
flash times (except £*) equal to each other, and take the value of £' or £. That is, all ft 
except f t t equal to 1 at the same time £ equaling either £^ or £ . Conversely, this function 
achieves its minimum if the spacing between the flash times are approximately as even as 
possible. We here choose the square distance because it has such simple properties and 
it is a widely used metric in statistics . This term can be used to answer a fundamental 
question about how neurons integrate their inputs, namely, do they care about the pattern 
of input or does only the total amount of input matter? 

Because SF measures the spacing of flashes conditioning on the number of flashes, it 
is then viable to consider including the interaction between CF and SF. Indeed, we found 
this interaction term is significant when fitting the experiment data, see Section HJ It is 
also possible to consider even higher order terms of these 3 predictors. For example, one 
may consider higher order terms, and their polynomial interactions. We have examined 
such possibilities up to the third order, but found only the interaction term between CF 
and SF survive forward/backward model selection with the AIC criterion. 
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50 100 150 200 250 300 



Time (ms) 

Figure 2: Schematic plots of 3 response functions (top 3 panels, in green) responding to 
examples sequences (bottom panel) of flashes / (t) and spikes s (t). Red long vertical arrows 
are spikes, blue short vertical bars with solid circle heads are flashes, grey dotted vertical 
lines represent the flash times, and grey solid vertical lines represent the spike times. 
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A variation of this pa rametric PRO mo del is to estimate these functions non-parametrically, 
especially using splines (IKass et all 120051 ) . Given the history of developments in general- 
izing parametric models to non-parametric and semi-parametric ones, we see that this 
approach is a straight forward extension of the ori ginal PRO model. We thus omit the de- 



tailed description on this extension here. Following 



Kass et al. 



(2005), we choose the spline 



bases to fit generalized additive models ( Hastie and Tibshirani . 199oh . and the generalized 
cross validation criterion is used to pick the smoothing parameter, as implemented in the 
R package mgcv. Because PRO discovered interactions between CF and SF parameters, 
we will consider fitting the model with a smoothing plane of CF and SF, in addition to a 
smoothing function of PF. This spline extension of PRO is thus named as PROs. 



4 Analysis of an Optogenetics Experiment 

In this section, we will employ PRO and PROs to summarize the neuronal spiking be- 
havior under an optogenetics experiment. The generalizability of these two appraoches 
will be compared using other methods. Especially, we will compare with two off-the-shelf 
predictive models, random forest (RF) and support vector machine ( SVM), and an impor- 



tant biophys i cal m odel, Generalized Integrate-and-Fire (GIF) model flPaninski e1 



Pillow et al. 



al 



2004 



1999 . 



20051 ). which extends the widely used Integrate-Fire model (IKoch . 
We first apply the PRO model to the first 10 sweeps of the optogenetics experiment 
illustrated in Figured! We then validate the fitted model using the next 10 sweeps. The 
prediction performance is compared with RF, SVM and GIF. 

We use the predictors PF, CF, and SF, which are based on only the history of the 
spike and flash time series, as explained in Section [3l The model fitting output from the 
first 10 runs are shown in Table [TJ This interaction model is selected using forward-back 
variable selection with the AIC criterion, where we initially start from a full model with 
all possible third order interactions. The correlations between these 3 predictors are -0.24, 
0.37, 0.47, where PF and SF has the highest correlation. The deviance R-square of this 
model is 36.1%. This shows that PRO explains moderate variation in the data, and these 
3 predictors are also strongly associated with the spiking process. 

As discussed before, instead of specifying the 3 functions parametrically in the PRO 
model, the PROs model employs splines bases and generalized additive models to estimate 
these functions from the data. Figure [3] plots both the partial regression function of PF 
and the fitted surface on CF and SF with fixed PF at its mean. The partial regression 
plot [3a] confirms that a linear function describes the contribution from PF well in most of 
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Coefficients 


Estimate 


SE 


Z value 


P-value 


Intercept 


-24.310 


4.115 


-5.907 


3.47e-09 


PF 


-1.447 


0.134 


-10.796 


C 2e-16 


CF 


15.719 


2.821 


5.572 


2.52e-08 


SF 


10.347 


2.204 


4.696 


2.66e-06 


CFxSF 


-7.108 


1.507 


-4.716 


2.41e-06 



Table 1: Model fitting output from PRO. The null deviance is 1400.80 (df=4867), and the 
residual deviance is 895.38. 

ranges, except for a drop from the linear line at the origin where PF=0. We think this 
could represent the time for a flash to activate light-sensitive ion channels and thereby cause 
neural depolarization immediately after each flash. Notably, the fact that the coefficient 
for SF is positive suggests that closely spaced light flashes have a synergistic influence 
on neural excitability. The fitted surface [3b] confirms that the main terms of CF and SF 
capture the relationship well, except at the large ends of these two predictors. In there, 
the surface increases to a plateau when both are large, which could indicate that the 
stimulating contribution should be discounted if quite a number of flashes are generated 
also very closely to each other. One possible biophysical interpretation is that the neuron 
may not fully integrate such fast inputs as a result of the inactivation of light-sensitive ion 
channels. This plateau observation is consistent with the negative fitted coefficient of the 
interaction term in the PRO model. 

The spline extension PROs seems to be able to capture very minor deviations from 
PRO, as shown in Figure |3j However, including these does not improve prediction much, 
as we will see below. Because PROs provides somewhat less straight forward output for 
studying the properties of neurons, we recommend this approach only for scientists who 
are comfortable interpreting such spline models. PROs is also recommended for scientists 
who are interested in modeling those slight variations of the neurons from PRO. 

To further validate the model, we use out-of-sample prediction performance as a cri- 
terion. We use the first 10 runs of time sequences as the training data to build models, 
and test the model validity on the next 10 runs. We adopt the popular receiver operating 
characteristic (ROC) criterion as the measure of the prediction performance, because it 
provides a unified framework to evaluate various probabilistic and non-probabilistic meth- 
ods, as we will compare to. If the model approximates the underlying firing mechanisms 
of the neuron, we believe that we will have good prediction power when forecasting future 
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(a) Partial regression function of PF (b) Fitted surface on CF and SF 



Figure 3: Plots of partial regression function of PF (left panel), and fitted surface (in 
logits) on CF and SF with fixed PF value at its mean (right panel), using the first 10 
sweeps. 

spikes based on the past history. For comparison purposes, the same procedure is carried 
out for RF, SVM and GIF procedures. In order to show the advantages of the designed 
functions PF, CF, and SF in the PRO model, we use only the standard predictors in other 
neural spiking models: t — t\ X/r=t* f' r ' an< ^ t — t*. The R implementations of RF and SVM 
are employed, and GIF is implemented in Matlab by its authors. 

The predicted spiking probabilities of PRO, PROs, RF, and SVM are compared with the 
true spiking sequence. The GIF model provides instead the estimated membrane potentials 
from a sequence of stimulus and spike history. Because the neural firing depends on the 
highest membrane potentials within each 5ms interval, we will substitute the maximum 
estimated membrane potential as the predicted quantities. 

Figure H] shows the ROC plot comparing these statistical and biophysical models. PRO 
and PROs clearly outperform all other models in both specificity and sensitivity. Their 
corresponding area-under-the-curve (AUC) values are 93% and 94% respectively, obviously 
higher than all other models. This result also shows that the two PRO models are advanta- 
geous for modeling the optogenetics data. PRO has slightly better prediction performance 
than PROs. We recommend the PRO model for scientifically-oriented users, because of 
its clear advantage in interpretation. GIF is the next best predictive model (AUC=88%), 
but its interpretation is limited by its complexity of modeling. The GIF model involves 
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Figure 4: Plot of the ROC curves of all models. The AUC values are as follows: PRO=93%, 
PROs=94%, RF=86%, SVM=78%, GIF=88%. 



estimating stimulus filter and spike-history filter functions among other parameters, with 
tens of model parameters in total, and these parameters are correlated estimates as they 
represent temporal effects of inputs and outputs. As a result, it is difficult to assess how 
the flash patterns drive the spike outputs by comparing these many correlated estimates. 
For example, it is rather involved to use the GIF model to answer the question whether 
closely or sparsely spaced spikes would encourage more spikes. On the other hand, the 
PRO coefficients provide simple and direct answers. 
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5 Simulations 



In this section, we use simulated data to illustrate how well the PRO models can capture 
neural behaviors and predict future spikes. To provide interpretability of PRO, we also 
explore how the PRO coefficients relate to the biophysical parameters of the simulation 
model. The PRO approaches are also compared with RF, SVM and GIF. Generalizability, 
biophysical interpretation and computation speed of these models will be emphasized in 
the comparisons. 



In a 



1 the s imula tion examples, we simulate data from the Leaky Integrate-Fire (LIF) 



model ( jKochl . Il999l ). which has become a canonical model in cellular neuroscience. It 



employs the following differential equation to model the membrane potential, and assumes 
spikes occur when this potential crosses a fixed threshold, 

where V (t) and / (t) are the membrane potential and stimulus processes respectively, C 
and R are model parameters representing the membrane capacitance and resistance, re- 
spectively. The resulting spike processes are given by putting a spike at the first time that 
V (t) reaches a pre-set threshold V t h, and then resetting V (t) to a small constant V reset 
immediately after the spike. We set the thresholding and resetting membrane potentials to 
be the conventional choice V t h = 1 and V reset = 0. The times £'s at which the threshold is 
reached are also recorded. / (t) is set to a box-shape stimulus, representing the flash pulses 
in our experiment. To simulate data from the model ([5j), we use 1/100 of our sampling 
rate (5ms) as step size, and then the first-order method for simulating from Equation ([5]) 
is sufficient for our purposes. 

Each simulated dataset is generated from the LIF model ([5]) for a period of 25s, at 
the same sampling rate of 5ms of our real experiment. That is, each dataset outputs two 
sequences of 5,000 time points, one for the flashes and the other for the spikes. These have 
the same dimension as the first 10 sweeps of the real experiment. The same length input 
sequence / (t) is randomly generated from iid Bernoulli(0.14), which indicate the 5ms bins 
when the box-shape stimulus (with height 1) is on. We set C = 7 and R = 3, because 
these values yield spiking sequences with similar summary statistics as our real experiment. 
These choices correspond to a membrane time constant of 21 msec. 

Because PRO can be efficiently computed, we first fit the model to 10,000 replications 
of simulation runs. Table [2] shows the frequencies of significant (at level 0.05) coefficients 
in PRO. The mean deviance R-squares of PRO fitting is 0.5681, with a standard error 
0.0002. Even though our model are specifically designed for the optogenetics experiments, 
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Coefficient for 


PF 


CF 


SF 


CFxSF 


%(p-value<0.05) 


88% 


100% 


100% 


100% 



Table 2: Frequencies of significant coefficients in our model at level 0.05. The standard 
error is no larger than 0.01 for all entries. 

these results confirm that PRO captures most of the spiking patterns from the classical LIF 
model. All the coefficients except for PF are significant for almost all the runs, where PF 
is significant over half of the runs. This may indicate that there are limited applicability 
of LIF to the optogenetics data, as this coefficient is highly significant in the real dataset 
with good predictive power. To summarize, PRO has good power of extracting non-zero 
coefficients, which is sensitive to the simulation model, LIF. 

To gain scientific insights of these fitted coefficients, we perform replicate the previous 
simulation 100 times with the varying C or R scenarios. Because large changes of C or 
R may cause neuron not to fire at all during the whole observation period, we consider 
11 relative changes from 80% to 120% of C and R respectively. In each scenario, we plot 
the fitted PRO coefficients of PF, CF, SF, and CFxSF respectively against the changing 
parameter (C or R). A small fraction of insignificant coefficients are dropped from the plots, 
because PRO is not guaranteed to have 100% power in all scenarios, especially for the PF 
coefficient as we show in Table [2j The dropped percentages of coefficients are 9.27% PF in 
the varying C scenario, 8.00% PF and 0.09% SF in the varying R scenario. Simple linear 
regression is employed to illustrate the relationship between these LIF parameters and the 
PRO coefficients. The trend of changes in PRO coefficients is clear except for PF, though 
all the slopes are significant at level 5 x 10~ 5 or less. These trends also reveal important 
scientific insights. For example, when R increases, the voltage leak decreases, and thus 
the rate of increasing spike probability due to increasing CF (more flashes) increases, and 
the rate of increasing spike probability due to increasing SF (more closely paced flashes) 
increases. 

We also compare the spike-to-spike prediction performance using the same LIF model 
as before. In each replication, we simulate 10,000 time points of flashes and spikes, where 
the first 5,000 points are used as the training dataset and the remaining 5,000 points are 
used as the testing dataset. Similar to the analysis of the real dataset, we build different 
models using the training data and compare its prediction performance on the testing data 
measured by AUC. Table [3] compares the average (SE) AUC values of different methods 
over 100 runs. Both PRO and PROs clearly outperforms all other methods, having the 
same best prediction performance and the smallest SE. The second best method is GIF 
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5.5 6.0 6.5 7.0 7.5 8.0 8.5 



c 

Figure 5: Plots of the PRO coefficients against the varying C parameter in the LIF model. 
The linear regression slopes for each PRO function are 0.18 (PF), -1.04 (CF), 0.29 (SF), 
0.41 (CFxSF), and all slopes are significant at level 10 -6 or less. 
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Figure 6: Plots of the PRO coefficients against the varying C parameter in the LIF model. 
The linear regression slopes for each PRO function are -0.28 (PF), 5.98 (CF), 1.68 (SF), 
-2.62 (CFxSF), and all slopes are significant at level 5 x 10~ 5 or less. 
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Method 


PRO 


PROs 


RF 


SVM 


GIF 


Average(SE)% 


97.50(0.02) 


97.41(0.03) 


92.22(0.19) 


92.29(1.54) 


94.20(0.85) 



Table 3: Comparison of average (SE) AUC values (in percentages) over 100 runs. The 
best performance values are highlighted in bold. 



Method 


PRO 


PROs 


RF 


SVM 


GIF 


Fitting 


0.195(0.003) 


4.959(0.603) 


1.624(0.005) 


1.234(0.007) 


469.80(5.49) 


Prediction 


0.097(0.0001) 


2.705(0.014) 


0.161(0.0004) 


0.223(0.0007) 


3135.95(43.44) 


Total 


0.293(0.003) 


7.664(0.604) 


1.784(0.005) 


1.457(0.007) 


3605.75(47.54) 



Table 4: Comparison of average (SE) computation times in seconds over 100 runs. The 
best performed values are highlighted in bold. 



with also the second largest SE. 

Table H] lists the computation times needed to fit the model and generate prediction for 
all methods. The computation is carried out on a machine with CPU Intel Core 2 6600 at 
2.4GHz and 8 Gb of memory. PRO is particularly advantageous when it takes only a small 
fraction of computation time of other methods, and RF and SVM have almost identical 
computation time. PROs is the fourth fastest, while GIF is the fifth fastest. 



6 Discussion 

This paper introduces a simple probabilistic framework for modeling optogenetics data. 
The PRO approach draws on successes of existing probabilistic framework for modeling 
spike train data. This model, however, introduces explicit point-process responses to ad- 
dress the challenges for modeling such sparse data. As of such, this model is shown to 
be advantageous in generalizability, interpretation and computation. A spline extension, 
PROs, is also discussed, and is shown to have a similar performance. 

An interesting direction is to test the decoding capability of our model. We can sim- 
ply modify our probability framework to capture the flash probability at each time given 
the history of spikes and flashes. This can be done by simply switching the notation 
s (t) and / (t) in the general formulatio n ([!]) and r e lated equations afterwards. Similar 



considerations, called reverse regression ( jKass et all 12005k exist for spike train data. It 



i s also interesting to st udy the Bayesian decoding methods based on an GLM encoding 
( Ahmadian et al. . 201 lh . 



21 



The PRO models can be generalized for data involving multiple neurons, by extending 
the single response logistic model to a multivariate response logistic model. The predictors 
are then combined design matrices across multiple neurons, and the response variables are 
then the spiking status of multiple neurons at the same time. The fitted coefficients then 
have a natural interpretation in terms of how one neuron integrates the synaptic inputs 
from other neurons in a network. The model can be fitted using maximum likelihood 
estimation procedures as well, because the joint likelihood criterion is concave. 

We are also interested in exploring additional biophysical meaning of the PRO coeffi- 
cients. One interesting direction is to include some feedback component in the model. As 
we see from the PROs fit, the PF coefficient may deviate from the linear form, if a flash 
and spike occur within the same period. Addition predictors capturing this can be included 
in the model to address this issue. It is interesting to explore their biophysical meaning as 
well. 

It is also interesting t o deve l op state space model s for predicting the spike times. 

Paninski et al. provide important directions in 



Kobayashi and Shinomot 



3 (H); 



analyzing th e spike train d a ta, an d it will be interesting to incorporate their approaches in 
our models. Behseta et ali ( 2005 ) also provided an important hierarchical model for spike 



train data, and it is very interesting to extend that work for optogenetics data. 



7 Appendix 

7.1 Proof of Lemma Q] 

Proof. Denote a fixed number K = \F t \. Let the general intra-flash time be non- negative 
integers cij = tcj\ — t(j-i) > for j = 1, . . . , K, where the first one is the time past the 
last flash t — t(iy It is equivalent to finding he maximum and minimum over a/s for the 
following objective function 

\Ft\ K 

^(«) = E( t o-D- t o)) 2 = E a ? ( 6 ) 

3=1 3=1 

under a fixed total sum constraint M = Ylf=i a j = t ~ $ an d the non-negative constraint 
clj > for j = 1, . . . , K. 

To derive the maximum of the objective function ([6]), the Cauchy-Schwartz inequality 
yields that 

K f K \ / K \ 

, 2 



E«?< (X>) (X>) 

j=l \j=l / \j=l / 



( t -t*y 
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It is straightforward to see that this maximum is achieved when one and only one aj equals 
to t — P. This is equivalent to the configuration of t(j)'s as stated. 

To derive the minimum of the objective function ([6]), the Holder's inequality yields the 
following inequality holds 

j=l \j=l J 

where the lower bound is achieved under the solution 

a* = M/K. (7) 

To obtain an integer solution a', we seek the inter grid points that are the closest to a* 
under the Euclidean distance, whose coordinates are integers of either \_M/K\ or \M/K~\ 
satisfying the sum constraint M. □ 
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